
Support DensityInterface API #1416

Merged
15 commits merged into master on Nov 9, 2021
Conversation

oschulz
Contributor

@oschulz oschulz commented Nov 5, 2021

Implements DensityInterface.hasdensity(::Distribution) and DensityInterface.logdensityof(d::Distribution, x).

This will form a basis for using distributions in generic code that needs to handle distributions as well as (non-normalized) densities.

CC @cscherrer, @devmotion, @mschauer, @phipsgabler, @tpapp.

@codecov-commenter

codecov-commenter commented Nov 5, 2021

Codecov Report

Merging #1416 (e9242a8) into master (d8380fc) will increase coverage by 0.04%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #1416      +/-   ##
==========================================
+ Coverage   83.60%   83.64%   +0.04%     
==========================================
  Files         118      119       +1     
  Lines        6891     6899       +8     
==========================================
+ Hits         5761     5771      +10     
+ Misses       1130     1128       -2     
Impacted Files                  Coverage Δ
src/density_interface.jl        100.00% <100.00%> (ø)
src/multivariate/mvnormal.jl    75.51% <0.00%> (+0.51%) ⬆️
src/matrix/matrixreshaped.jl    95.83% <0.00%> (+4.16%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d8380fc...e9242a8. Read the comment docs.

@oschulz
Contributor Author

oschulz commented Nov 5, 2021

Failures on nightly seem unrelated.

@mschauer
Member

mschauer commented Nov 5, 2021

Can you write more about what this would be good for, maybe a specific use case, also compared to the alternative of just creating an anonymous function x -> logpdf(MyObject, x) and passing that around?

@oschulz
Contributor Author

oschulz commented Nov 5, 2021

Can you write more about what this would be good for, maybe a specific use case

Sure. I think @phipsgabler put it very nicely in TuringLang/AbstractPPL.jl#38: "

  • to avoid the IMHO artificial distinction between "PDFs" and "PMFs",
  • to use an appropriate term for densities which are not probability densities, like (unnormalized) posteriors and likelihoods, and
  • to have a generic hypernym covering loglikelihood, logprior, etc., which are all expressible in terms of logdensity of differently conditioned models.

All of this works much better, of course, if Distributions takes part in it. DensityInterface is about things that are a density or can be said to have a density. Distributions definitely have a (probability) density, of course. Even discrete distributions can be said to have a density with respect to a natural base measure (the counting measure) - which corresponds exactly to what logpdf(d, x) calculates at the moment.
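To illustrate the point about discrete distributions, here is a minimal sketch (assuming the API as merged here, where logdensityof(d::Distribution, x) falls back to logpdf(d, x)):

```julia
using Distributions, DensityInterface

d_disc = Poisson(4.2)      # discrete: density w.r.t. the counting measure
d_cont = Normal(0.0, 1.0)  # continuous: density w.r.t. the Lebesgue measure

# One generic call covers both cases, with no PDF/PMF distinction in user code:
ld_disc = logdensityof(d_disc, 3)    # same value as logpdf(d_disc, 3)
ld_cont = logdensityof(d_cont, 0.5)  # same value as logpdf(d_cont, 0.5)
```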

compared to the alternative of just creating an anonymous function x -> logpdf(MyObject, x) and passing that around

Using plain log-density functions can be very awkward at higher levels of code. To illustrate, in Julia examples for Bayesian analysis I used to write log_likelihood = some_func; prior = SomeDistribution() - only to be asked by people "why one with log_ and not the other?". It just doesn't read well.

On the other hand, likelihood = some_func; prior = SomeDistribution() is misleading, dangerous and wrong, if some_func returns the log of the likelihood. And using log_likelihood = some_func; log_prior = x -> logpdf(SomeDistribution(), x) also won't work in many cases, since important information is lost: An algorithm can't sample log-prior function to get starting values, for example.

But likelihood = logfuncdensity(some_func); prior = SomeDistribution() is clear to the reader and safe: the code using likelihood and prior (and the posterior, and so on) can use logdensityof and be sure to get the log. And in more complex use cases it will often be likelihood = SomeLikelihood(data), an object that can be passed around without loss of information (one can still access the data at any time as well as calculate its (log-)density value).
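The pattern above can be sketched as follows (some_log_likelihood and the concrete prior are illustrative placeholders; logfuncdensity is the DensityInterface constructor referred to above):

```julia
using Distributions, DensityInterface

# A stand-in log-likelihood function (illustrative only):
some_log_likelihood(x) = -0.5 * sum(abs2, x)

# Wrapping makes the log-ness explicit to downstream code:
likelihood = logfuncdensity(some_log_likelihood)
prior = MvNormal(zeros(2), [1.0 0.0; 0.0 1.0])

x = [0.3, -1.2]
# Generic code can now treat both objects uniformly:
log_posterior_at_x = logdensityof(likelihood, x) + logdensityof(prior, x)
```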

There's also the fact that one doesn't always want the log value of a density: when integrating, for example, the algorithm typically needs the non-log values of the density. In many cases, it will be fine to use exp(log_of_my_density) - this is why DensityInterface.densityof(d, x) defaults to exp(DensityInterface.logdensityof(d, x)), just like Distributions.pdf(d, x) defaults to exp(Distributions.logpdf(d, x)). But there are also densities for which it is faster or more accurate to calculate the non-log value directly (and so DensityInterface.densityof and Distributions.pdf can be specialized). This doesn't work when passing plain log-density functions around directly.
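The default fallback relationship described here can be checked directly (a sketch, assuming the merged API):

```julia
using Distributions, DensityInterface

d = Normal(0.0, 1.0)
x = 0.7

# By default, densityof is exp of logdensityof, mirroring pdf vs. logpdf;
# both can be specialized when a direct non-log computation is better.
p1 = densityof(d, x)
p2 = exp(logdensityof(d, x))
p3 = pdf(d, x)
```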

DensityInterface is not limited to Bayesian applications, of course - in fact, it's intended to be very general. The current API originated through (I think very productive) discussions (JuliaMath/DensityInterface.jl#1, JuliaMath/DensityInterface.jl#3, JuliaMath/DensityInterface.jl#4) among several people in the Julia community who have an interest in (log-)densities and are involved in packages that would profit from this. Maintainers of Distributions were also involved. Having Distributions support DensityInterface was an important aspect of the design.

DensityInterface is an extremely lightweight package with no dependencies beyond Test and InverseFunctions (which is extremely lightweight itself and introduces no further dependencies), so there should be no impact on the load time of Distributions (a @time using Distributions test resulted in a load time of 1.05 s before and 0.99 s after this PR, so no significant effect).

@oschulz
Contributor Author

oschulz commented Nov 5, 2021

Added DensityInterface.densityof(d::Distribution, x) = pdf(d, x).

I think this is complete now.

src/density_interface.jl: review comments (outdated, resolved)
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
@oschulz
Contributor Author

oschulz commented Nov 5, 2021

@devmotion, now that we have DensityInterface.logdensityof(d::Distribution, x) = loglikelihood(d, x) should DensityInterface.densityof(d::Distribution, x) call exp(loglikelihood(d, x)) or pdf(d, x)?

@phipsgabler
Contributor

I'd say specialize to Distributions.pdf. Many distributions probably will then do the exp thing internally anyway, but maybe some can have simpler implementations. It also gives Distributions.jl the full control and gets rid of possible numerical differences between pdf and densityof. (There's no extra pmf, right?)

Something related: I remember one bug originating from a distribution which did it the other way round and implemented logpdf in terms of pdf, which caused a stack overflow at some place. Could that be a problem?

@oschulz
Contributor Author

oschulz commented Nov 5, 2021

I'd say specialize to Distributions.pdf

Seems more natural to me as well (it's what this PR currently does).

@devmotion
Member

should DensityInterface.densityof(d::Distribution, x) call exp(loglikelihood(d, x)) or pdf(d, x)

Ideally it should call likelihood(d, x), which has the same properties as loglikelihood but returns exp(loglikelihood(d, x)). Unfortunately, such a method does not exist yet 😄 So I think it is better to use the default fallback for now (i.e., exp(loglikelihood(d, x)), but not defining it), since otherwise densityof and logdensityof would be very inconsistent: pdf and loglikelihood return different types for some inputs (pdf may also return arrays) and support different inputs.
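A small illustration of the type difference mentioned here: for a collection of samples, pdf (broadcast) yields an array of pointwise densities, while loglikelihood reduces to a single number:

```julia
using Distributions

d = Normal(0.0, 1.0)
xs = [0.1, -0.4, 1.3]

densities = pdf.(d, xs)          # Vector of three pointwise densities
joint_ll = loglikelihood(d, xs)  # one number: sum(logpdf.(d, xs))
```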

Something related: I remember one bug originating from a distribution which did it the other way round and implemented logpdf in terms of pdf, which caused a stack overflow at some place. Could that be a problem?

I think I fixed all of these potential StackOverflowErrors, the dispatching should be much more sane nowadays.

@oschulz
Contributor Author

oschulz commented Nov 5, 2021

So I think it is better to use the default fallback for now

Ok, I took the specialization of DensityInterface.densityof out and left a comment about this.

@oschulz
Contributor Author

oschulz commented Nov 7, 2021

@mschauer, @devmotion, is this fine like this from your side?

test/density_interface.jl: review comments (outdated, resolved)
src/density_interface.jl: review comments (outdated, resolved)
Also make logfuncdensity call logpdf for distributions.
@oschulz
Contributor Author

oschulz commented Nov 7, 2021

CI will fail, requires JuliaMath/DensityInterface.jl#6

@oschulz
Contributor Author

oschulz commented Nov 7, 2021

Have to wait for JuliaRegistries/General#48369

@oschulz
Contributor Author

oschulz commented Nov 7, 2021

Oh, it did pass on Julia v1.0 and v1.6, actually. It failed for me locally on v1.7 without JuliaMath/DensityInterface.jl#6. But you can review the changes already. :-)

@oschulz
Contributor Author

oschulz commented Nov 8, 2021

Test failures on nightly seem unrelated. Ready for a fresh review round. :-)

src/density_interface.jl: review comments (outdated, resolved)
docs/src/density_interface.md: review comments (outdated, resolved)
@mschauer
Member

mschauer commented Nov 9, 2021

I find Distributions' loglikelihood accepting both a single sample and multiple samples a code smell. Is the single-sample version used in Turing? We might not want to support the single-sample form for "IIDDensity" if not.

@oschulz
Contributor Author

oschulz commented Nov 9, 2021

Is the single-sample version used in Turing

I had similar concerns, but from what I understood, Turing actually depends on having densities that work for both single and multiple samples - correct, @devmotion?

src/density_interface.jl: review comments (outdated, resolved)
@devmotion
Member

I had similar concerns, but from what I understood Turing actually depends on having densities that will work for both single and multiple samples - correct, @devmotion ?

Yes, in expressions of the form x ~ dist, where x is some data/observation(s)/... that is assumed to be i.i.d. according to distribution dist, we evaluate just loglikelihood(dist, x) (of course, the computation is skipped if we only compute the prior probabilities). IMO it is a feature that loglikelihood accepts both single and multiple samples - the only assumption of loglikelihood is that the data is i.i.d., but the number of samples in the data set is arbitrary. And why should one have to artificially wrap a single sample in a container with one element, or have to define custom dispatches to logpdf?

Also implementation-wise there are no ambiguities as it is clear from the type of the distribution if x is a single sample (then one can just fall back to logpdf), an array of samples, or another supported collection of samples. It only becomes confusing and difficult to implement if it is left for other packages to determine if x is suitable for dist, if it is a single sample or if it is a suitable collection of samples: then they have to start dispatching on the distribution type which requires knowledge about and synchronization with implementation details in Distributions and other packages that define distributions. Moreover, it means that only distributions and collections that are known when the dispatches are defined are supported. Therefore in my opinion the current loglikelihood interface is very valuable and removing support for single samples would be a major regression.
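A sketch of the behavior defended above, using era-appropriate Distributions APIs (the concrete distributions and values are illustrative):

```julia
using Distributions

d = Normal(0.0, 1.0)

ll_single = loglikelihood(d, 0.5)        # one sample: equals logpdf(d, 0.5)
ll_joint = loglikelihood(d, [0.5, 1.0])  # i.i.d. set: sum of logpdfs

# For a multivariate distribution the distribution type disambiguates:
# a Vector is one variate, a Matrix is a collection of column variates.
mv = MvNormal(zeros(2), [1.0 0.0; 0.0 1.0])
x1 = [0.1, 0.2]
ll_mv = loglikelihood(mv, x1)            # equals logpdf(mv, x1)
```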

@oschulz
Contributor Author

oschulz commented Nov 9, 2021

I find Distributions' loglikelihood accepting both a single sample and multiple samples a code smell.

I'm also ambivalent about it. But one could argue that things like broadcast also allow using a scalar instead of an array with a single element.

@devmotion
Member

Maybe we could do the following, as I'm also not particularly satisfied with the slightly unusual design of IIDDensity and its limitations, but we need something that behaves like loglikelihood (I still think it's a feature that it supports both single and multiple samples - why shouldn't it be possible to evaluate the log-likelihood of a single observation, or, alternatively, why should one have to wrap it in a container? I think the broadcast argument/view is quite fitting):

  • Only define the standard logdensityof and densityof for Distribution in this PR, with the errors for cases with multiple samples for which logpdf and pdf are defined in Distributions (in all other cases it will error automatically)
  • Define an internal IIDDensity in DynamicPPL to get the loglikelihood behaviour in Turing for Distributions (this PR shows this can be done in a very simple way) until there is a better/different solution (e.g. based on the refactored Product)

@oschulz
Contributor Author

oschulz commented Nov 9, 2021

Define an internal IIDDensity in DynamicPPL

So we just move IIDDensity to DynamicPPL? Sounds good.

@oschulz
Contributor Author

oschulz commented Nov 9, 2021

Ok, here's our current IIDDensity implementation, to make it easy to find for DynamicPPL later on:

"""
    IIDDensity(d::Distribution)

Represents the probability density of an implicit product distribution of
variates that are identically and independently distributed according to
the distribution `d`.

Use `DensityInterface.logdensityof(d, x)` to compute the logarithmic density
value at `x`. `x` may be a single variate of `d` or a whole set of variates
of `d`.

If `x` is a single variate of `d`, the density is the PDF of `d`.

If `x` is a set of variates (given as a higher-dimensional array or
an array of arrays), the density is the PDF of an implicit product
distribution over `d`, whose size is implied by the size of the set.

`DensityInterface.logdensityof(IIDDensity(d::Distribution), x)` is equivalent
to `loglikelihood(d, x)`.
"""
struct IIDDensity{D<:Distribution}
    distribution::D
end

@inline DensityInterface.hasdensity(d::IIDDensity) = true

# ToDo: Move documentation of behavior of `logdensityof(d::IIDDensity, x)` to
# a method docstring of logdensityof here if/when IIDDensity becomes exported.
DensityInterface.logdensityof(d::IIDDensity, x) = loglikelihood(d.distribution, x)

# =================================================

@testset "IIDDensity" begin
    using DensityInterface

    d_mv = MvNormal([2.3 0.4; 0.4 1.2])
    x = [rand(d_mv) for i in 1:3]
    ref_logd_at_x = loglikelihood(d_mv, x)
    d = Distributions.IIDDensity(d_mv)

    DensityInterface.test_density_interface(d, x, ref_logd_at_x)
    DensityInterface.test_density_interface(d, hcat(x...), ref_logd_at_x)

    # Stricter than required by test_density_interface:
    @test logfuncdensity(logdensityof(d)) === d
end

We also had this text in the docs:

You can use

DensityInterface.logdensityof(Distributions.IIDDensity(d), x)

and

DensityInterface.densityof(Distributions.IIDDensity(d), x)

to evaluate the log-density and density of variate value(s) x that are assumed to be identically and independently
distributed according to distribution d. Here x can be either a single variate value or a set of variate values.
DensityInterface.logdensityof(Distributions.IIDDensity(d), x) falls back to loglikelihood(d, x).

@oschulz
Contributor Author

oschulz commented Nov 9, 2021

Ok, IIDDensity is gone ... bye for now, IIDDensity, see you again in DynamicPPL. :-)

@oschulz
Contributor Author

oschulz commented Nov 9, 2021

@devmotion and @mschauer, Ok with you in the new state?

Member

@devmotion devmotion left a comment


Looks good to me, can you update the version number?

@oschulz
Contributor Author

oschulz commented Nov 9, 2021

Looks good to me, can you update the version number?

Done. Let's wait until we've resolved the last open issue (question regarding return type of hasdensity) though.

@oschulz
Contributor Author

oschulz commented Nov 9, 2021

Ok, I think we're through, all good from my side.

@mschauer mschauer merged commit 2fbe852 into JuliaStats:master Nov 9, 2021
@oschulz oschulz deleted the density-interface branch November 9, 2021 16:33
@mschauer
Member

mschauer commented Nov 9, 2021

I merged this, trusting that we revisit how to allow the DensityInterface to preserve and make available within the interface some of the information and context which defines what kind of object pdf actually returns and which is available through our type parameters.

@oschulz
Contributor Author

oschulz commented Nov 9, 2021

that we revisit how to allow the DensityInterface to preserve and make available within the interface some of the information and context which defines what kind of object pdf actually returns

Absolutely. From my side, DensityInterface is a beginning, not the end of this story.
